Abstract

Since the inception of genetic algorithmics the identification of computational efficien- 
cies of the simple genetic algorithm (SGA) has been an important goal. In this paper we 
distinguish between a computational competency of the SGA — an efficient, but narrow com- 
putational ability — and a computational proficiency of the SGA — a computational ability 
that is both efficient and broad. To date, attempts to deduce a computational proficiency of
the SGA have been unsuccessful. It may, however, be possible to inductively infer a compu- 
tational proficiency of the SGA from a set of related computational competencies that have 
been deduced. With this in mind we deduce two computational competencies of the SGA. 
These competencies, when considered together, point toward a remarkable computational 
proficiency of the SGA. This proficiency is pertinent to a general problem that is closely 
related to a well-known statistical problem at the cutting edge of computational genetics. 

1 Introduction 

When applied to combinatorial optimization problems that are poorly understood, or known to 
be NP-Hard, simple genetic algorithms (SGAs) frequently evolve usable solutions after evaluat- 
ing a relatively small number of samples. At a general level it seems reasonable to presume that 
SGAs owe their adaptive prowess to a capacity for performing one or more kinds of computation 
relatively efficiently, i.e. robustly and scalably relative to other known algorithms. The iden- 
tification of such computational efficiencies — the source of the remarkable adaptive capacity of 
SGAs — has been a goal since the inception of the field of genetic algorithmics. 

The early genetic algorithmics literature (pre-1990s) is marked by spirited debates about
the purported efficiency with which SGAs "process" schemata (for a review see [19, p119-127],
[§1 p114-119], p8[ p74-78]). Claims about the computational efficiency of SGAs made during
this period were supported almost entirely by theoretical arguments, or to be more precise, 
quasi-theoretical arguments — though these arguments made use of mathematics, i.e. they were 
deductive in parts, the amount of a priori and a posteriori speculation involved was substantial. 
These claims were rarely, if ever, backed up by empirical support. To the best of our knowledge
no bold claim about the computational efficiency of the SGA made during this period has been
empirically supported in a convincing fashion. 

Instead, when empirical tests were conducted, a large gap was discovered between the claimed 
efficiency of the SGA and its actual performance. The Royal Roads experiments [20] [9] in
particular proved to be a watershed in the history of genetic algorithmics. On a class of fitness 
functions called Royal Roads that were tailor-made to play to a much vaunted computational 
efficiency of the genetic algorithm — its capacity for building block identification and composition 
in parallel — a random mutation hillclimber seemed to need far fewer fitness evaluations to find 
the global optimum [9] [18]. Following the publication of these results, many researchers declared
the SGA inefficient, and began inventing algorithms geared towards the explicit implementation 
of the abstract process described in the building block hypothesis [13, p55] [15] HSl [27].

The publication of the Royal Roads experiments was the first of two major developments 
that seem to have cooled the search for computational efficiencies of the SGA. The second 
development was the publication of the no free lunch (NFL) theorems for optimization [33].
After deriving one of the most general NFL theorems obtained to date, Igel and Toussaint
[16] conclude that in all likelihood NFL does not apply to the practice of black-box optimization.
This conclusion certainly seems to be borne out by the empirical record; in the practice of black- 
box optimization, free lunches seem to be aplenty. Unfortunately, for much too long, the NFL 
theorem has subliminally, if not overtly, been regarded as "proof" that the opposite is true.
This belief entails that when an SGA outperforms random search (or for that matter, pick-the- 
worst-neighbor search) on some black-box optimization problem in practice, it does so because 
of a fortuitous pairing between problem and algorithm. If this is so, then any advantage over 
random search that accrues when a genetic algorithm is used for black-box optimization in 
practice can be ascribed to fortune rather than to some innate computational efficiency of the 
genetic algorithm. 

Through both of the aforementioned developments, the simple genetic algorithm has contin- 
ued to be frequently, and successfully used to perform black-box optimization in fields ranging 
from finance to operations research to electrical engineering. The identification of one or more 
computational efficiencies of the SGA still promises to help us understand the reason for its
success. 

1.1 Computational Competencies and Proficiencies 

To demonstrate an efficiency of some computational system one typically shows that the system 
can scalably and robustly solve some problem — typically the problem that the system was ex- 
plicitly designed to solve. But what about computational systems that aren't explicitly designed 
to solve some well-defined problem (e.g. brains, genetic algorithms)? While we may conjecture 
that some of these systems are capable of efficient computation, expressing this sentiment rig- 
orously is far from simple. A fundamental difficulty lies in identifying a general problem such 
that one can derive impressive bounds on the computational complexity of the system when it 
is applied to the problem. 

Suppose, instead, one succeeds in identifying a set of specific problems for which impressive 
computational bounds can be derived. Even if the problems in this set are highly specific, it
may be possible, by noting similarities between the problems, to intuitively infer the outlines 
of a more general problem that the system can tackle efficiently. To distinguish between a 
computational system's ability to efficiently tackle some vaguely defined general problem, and 
its proven ability to efficiently solve some well-defined specific problem, we refer to the former
ability as a computational proficiency, and the latter ability as a computational competency. 

In this paper we identify two computational competencies of the SGA. That is, we identify 
two specific problems, and show that the SGA can solve each problem very efficiently — certainly 
more efficiently than the "mainstream" computational technique for solving these problems. 
When these two competencies are considered together, they point unmistakably towards a pow- 
erful computational proficiency of the SGA. Remarkably, the general problem that this profi- 
ciency is concerned with is closely related to a well-known statistical problem at the cutting 
edge of computational genetics having to do with the identification of epistatically interacting 
quantitative trait loci (QTLs). 

1.2 Epistatically Interacting QTLs 

Consider a phenotypic trait for which there exists a single polymorphic locus^1 such that allele
substitutions at this locus result in large changes in the phenotypic trait. Many such traits 
have been identified (e.g. seed color in pea-plants, eye color in fruit flies, presence of sickle cell 
anemia, presence of cystic fibrosis). In most cases, however, changes in a phenotypic trait are 
more fine-grained, and are influenced by allele substitutions at several polymorphic loci. Such 
traits are called complex or quantitative, and the loci that influence them are called quantitative 
trait loci. An important goal of modern genetics is the identification of quantitative trait loci 
for traits of interest, e.g. the oil content of corn seeds [14, p164], grain weight in rice plants [34],
and of course, susceptibility to common diseases with complex genetic underpinnings (cancer,
diabetes, schizophrenia, etc.).

A popular technique for identifying loci that affect quantitative traits is called genome scan- 
ning. Given the genomes of a set of individuals and the corresponding values of a particular 
quantitative trait, genomic loci are visited one by one to determine which loci have a statistically 
significant effect on the trait when averaged over all other loci. Geneticists distinguish between 
the main effect of a locus and its interaction effect with other loci. Frankel and Schork [10]
distinguish between the two as follows: 

"A main effect is the average effect of a [locus] taken over all other [loci] . Main effects 
ultimately emerge when one is studying, or mapping, a [locus] either in isolation 
or without regard to other [loci]. Interaction effects are those attributable to the 
simultaneous influence of two or more [loci]. Most contemporary data analysis and 
statistical modeling strategies for genome scan investigations assess the significance 
of only the main effects of potential trait loci."

^1 A locus with multiple alleles.

Table 1: Marginal values of two (top table) and three (bottom table) bi-allelic interacting loci.
None of the loci have main effects.

Frankel and Schork then eloquently explain why interaction effects have not received much
attention, and point out the peril of concentrating solely on main effects: 

"There are, of course, many scientific reasons which in part account for this main 
effect 'bias' and these reasons all derive from difficulties surrounding the statistical 
treatment of epistatic effects . . . Given these difficulties, it is easy to see why epistatic 
effects have been neglected in favor of main effects in complex trait analysis inves- 
tigations. Unfortunately, however, there exists the possibility that a [locus's] effect 
might only be detected within a framework that accommodates epistasis. Thus, for 
example, a [locus's] true main effect might be too small to detect with any reasonable 
statistical power and sample size, and yet it might enter into a critical epistatic effect 
with a second [locus]." [10] 

It is easy to see how a group of loci can interact even though no locus in this group has 
a main effect. Table 1(a) shows how this might happen when two loci A and B interact (for
the sake of simplicity we have assumed bi-allelic haploid genomes). Note how neither of these
loci has a main effect (the marginal value of each allele of each locus is zero) even though they
clearly influence the trait in question. Table 1(b) shows how three bi-allelic loci A, B, and C
can interact epistatically on a trait, yet have no main effect. (The reader can check that the
marginal value of each allele of each locus is zero.) In fact, for any non-empty set of loci
$\{A_1, \ldots, A_n\}$, and any set of bits $\{b_1, \ldots, b_n\}$, one can construct a similar table by letting the
marginal values of the two genotypes $b_1 \ldots b_n$ and $\bar{b}_1 \ldots \bar{b}_n$ be some value $\delta$ and letting the marginal
values of all other genotypes be $-\delta/(2^{n-1}-1)$. (This observation will come in handy in our definition
of type 1 pivotal functions in Section 3.)
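
The construction just described can be checked mechanically. The following Python sketch is an illustration only: the names `interaction_table` and `main_effect` are hypothetical, and at least two loci are assumed so that the denominator $2^{n-1}-1$ is non-zero. It builds the table for a chosen bit pattern and computes every single-locus marginal.

```python
from itertools import product

def interaction_table(bits, delta):
    """Map every genotype over len(bits) bi-allelic loci to a trait value.

    The genotype `bits` and its bitwise complement get the value delta;
    every other genotype gets -delta / (2**(n-1) - 1).
    """
    n = len(bits)
    if n < 2:
        raise ValueError("this sketch assumes at least two loci")
    target = tuple(bits)
    complement = tuple(1 - b for b in bits)
    other = -delta / (2 ** (n - 1) - 1)
    return {g: (delta if g in (target, complement) else other)
            for g in product((0, 1), repeat=n)}

def main_effect(table, locus, allele):
    """Average trait value over all genotypes carrying `allele` at `locus`."""
    values = [v for g, v in table.items() if g[locus] == allele]
    return sum(values) / len(values)

# Three loci, pattern 101, increment 0.18 (values chosen for illustration).
table = interaction_table((1, 0, 1), delta=0.18)
effects = [main_effect(table, k, a) for k in range(3) for a in (0, 1)]
```

Every entry of `effects` is zero up to floating-point rounding, even though the genotypes 101 and 010 clearly stand apart from the rest.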

If loci that interact also have statistically significant main effects then these loci will be 
detected by genome wide scans for main effects. Once detected, the interactions between the 
loci can be mapped. If, however, loci that interactively influence a quantitative trait have no 
main effects (or if their main effects are statistically insignificant) then, as Frankel and
Schork have explained, one will not detect such loci unless one explicitly uses an investigative
technique that "accommodates epistasis".

Main effects can be detected by visiting loci one at a time and testing for differentiated 
marginals (marginals with non-zero marginal values). Let us call this strategy differentiated 
marginal testing (DMT). To the best of our knowledge, the only sure way to accommodate
epistasis between loci when main effects are absent, is to visit multi-locus combinations, and 
to test the (multivariable) marginal of each such combination for differentiation. We shall call 
this approach combinatorial differentiated marginal testing, or combinatorial DMT for short. 
The computational intractability of combinatorial DMT, even for small combination sizes, is 
discussed in a recent article by Moore [H]. Moore remarks: 

"Identifying the optimal combination of [loci] from an astronomical number of possi- 
ble combinations is computationally infeasible, especially when the [loci] do not have 
independent [i.e. main] effects. The following example illustrates the computational 
magnitude of the problem. Let's assume that $10^6$ [loci] have been measured. Let's
also assume that 1,000 computational evaluations can be completed in one second
on a single processor and that 1,000 processors are available for use. Exhaustively
evaluating all of the approximately $4.9 \times 10^{11}$ two-way combinations of [loci] would
require approximately 5.7 days. Exhaustively evaluating all of the approximately
$1.6 \times 10^{17}$ three-way combinations of [loci] would require 1,929,007 years. This of
course assumes a best-case scenario in which the genetic model of interest consists 
of only two or three important attributes or genetic variations." 

The problem described above (see also [23] and [22]) is a specific instance of the general problem
of identifying interacting attributes in data mining [11].
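
The arithmetic behind Moore's example can be reproduced in a few lines. This is a sketch under the assumptions stated in the quotation (one million loci, 1,000 evaluations per second per processor, 1,000 processors); the helper name `exhaustive_scan_days` is hypothetical.

```python
from math import comb

SECONDS_PER_DAY = 86_400

def exhaustive_scan_days(num_loci, order, evals_per_second):
    """Return (number of combinations, days needed to evaluate all of them)."""
    combos = comb(num_loci, order)
    return combos, combos / evals_per_second / SECONDS_PER_DAY

# Assumed throughput: 1,000 evaluations/second on each of 1,000 processors.
rate = 1_000 * 1_000

pairs, pair_days = exhaustive_scan_days(10**6, 2, rate)
triples, triple_days = exhaustive_scan_days(10**6, 3, rate)
```

Under these assumptions the two-way scan takes a little under six days, in line with the quoted figure. The three-way scan comes to about 1.9 million days (roughly 5,300 years), which suggests that the unit attached to the quoted three-way figure may deserve a second look. Either way, the jump from days to millennia when the combination size grows by one is the point of the example.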

In this paper we focus on generative versions of two specific problems having to do with the
identification of interacting attributes. Crucially, main effects are entirely absent in both
problems. By "generative" we mean that the value of any synthesized data point can be queried, as
in active learning. The first problem can only be solved by a combinatorial DMT strategy that
tests attributes in combinations of two or more. The running time of such a strategy is therefore
at least quadratic in the number of attributes. The second problem can only be solved by a
combinatorial DMT strategy that tests attributes in combinations of four or more; the time required
by this strategy is therefore $\Omega(\ell^4)$, where $\ell$ is the number of attributes of an instance of the
problem. We will show that both the first and the second problem can be solved robustly by
an SGA in time that is linear with respect to the number of attributes. Moreover, we will show
that in both cases, the query complexity^2 of the SGA is constant with respect to the number of
attributes.



2 Our Mode of Analysis 

This paper is somewhat unusual as foundational studies of genetic algorithms go in that
experiments play a primary role in our mode of analysis. Experiments are typically used in foundational
GA research either to confirm behavior predicted by formal models, or to draw attention to 
phenomena not predicted by prevailing theories (e.g. [29] [20] El IB]). The use of experiments as
a primary tool of analysis is, however, typically avoided because of the problem of specificity. 

One can identify two kinds of specificity. First, as GAs are stochastic processes, any ob- 
servations about the behavior of a GA during some run are, strictly speaking, only valid for 
the integer used to seed the random number generator. Of course, one can easily circumvent 
this problem by running the GA several times with different seeds. Doing so allows one to 
build confidence that observed effects are not artifacts of some random seed. In most cases it is 
straightforward to quantify this confidence using statistics. 

The second kind of specificity is more problematic. Strictly speaking, an experimental result 
only pertains to the parameter values used in the experiment. In practice it may be possible, 
by changing a parameter while holding all others constant, to glean the relationship between 
that parameter and some aspect of GA behavior. However, if our aim is to be rigorous, then 
the extrapolation involved in this approach is less than ideal. In this paper we circumvent 
the problem with the second kind of specificity by exploiting the symmetries of the SGAs we 
construct. By doing so we obtain hard quantitative results from a single experiment for an infinite
set of problem instances. 

Symmetry arguments [31] [17], while not new to GA research (for a previous instance see
[2]), are not common either. Such arguments are more frequently used in physics and chemistry.
Indeed, according to the theoretical physicist E. T. Jaynes, "almost the only known exact results
in atomic and nuclear structure are those which we can deduce by symmetry arguments, using
the methods of group theory" [17, p331-332].

^2 Because fitness evaluation is by far the most time-consuming part of a typical GA run, genetic algorithmicists
often use the term time complexity to refer to the relationship between a parameter of some problem and the
number of fitness evaluations required by a GA to solve the problem. In this paper we use the term query complexity
instead. Our usage of this term is in line with its usage in theoretical computer science (the fitness function of a GA
can be thought of as an oracle that gets "queried" by the GA). We use the term time complexity as it is typically
used in computer science, to refer to the relationship between a problem parameter and the number of "basic
steps" required to solve the problem.

One does not, however, need to venture so far afield in order to find an example of a symmetry
argument. Let $\mathfrak{B}_n$ denote the set of bitstrings of length $n$. For any bitstring $g$, let $\bar{g}$ denote
the bitwise complement of $g$ (for example, $\overline{1011} = 0100$). Let $\ell$ be some positive integer, and
let $f$ be some fitness function over $\mathfrak{B}_\ell$ such that for any bitstring $g \in \mathfrak{B}_\ell$, $f(g) = f(\bar{g})$ (for
example, if $\ell = 4$ and $f(1011) = 2.75$, then $f(0100) = 2.75$). Let $G$ be some finite population
SGA with fitness function $f$, such that the initial generation of $G$ is drawn from the uniform
distribution over the set $\mathfrak{B}_\ell$. For any generation $t$, let $p^{(t)}(g)$ denote the probability that some
bitstring $g \in \mathfrak{B}_\ell$ will be in the population of $G$ in generation $t$. Then by appreciating the
symmetry of the situation, we can deduce that for any generation $t$, and any bitstring $g \in \mathfrak{B}_\ell$,
$p^{(t)}(g) = p^{(t)}(\bar{g})$.

This result holds regardless of the size of the population, the mutation and crossover rates, 
the mutation and crossover operators used, and the way in which the SGA G scales the fitness 
values of individuals (if it does) and performs selection. Since mathematical models of genetic 
algorithms with finite populations (e.g. [25]) tend to be unwieldy, a formal proof of the above, i.e. 
a proof within some formal axiomatic system, would be quite involved and would be relatively 
inaccessible. "The great power of symmetry arguments lies just in the fact that they are not 
deterred by any amount of complication in the details", writes Jaynes [17, p331]. Symmetry
arguments, in other words, allow one to cut through complications that might hobble other 
modes of argument. 
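
One way to make the complement symmetry of the example concrete is a coupling experiment: run the same SGA twice with the same random seed, once from some initial population and once from its bitwise complement, and compare the outcomes. The sketch below is illustrative only. The fitness function, the operators (fitness-proportional selection, uniform crossover, bit-flip mutation) and all parameter values are assumptions chosen for brevity, not the paper's experimental setup.

```python
import random

def complement(g):
    return tuple(1 - b for b in g)

def fitness(g):
    # Hypothetical complement-symmetric fitness: fitness(g) == fitness(complement(g)).
    ones = sum(g)
    return 1.0 + abs(2 * ones - len(g))

def next_generation(pop, rng, mutation_rate=0.05):
    weights = [fitness(g) for g in pop]
    children = []
    for _ in range(len(pop)):
        # Fitness-proportional selection of two parents.
        x, y = rng.choices(pop, weights=weights, k=2)
        # Uniform crossover: each bit comes from x or y with equal probability.
        mask = [rng.random() < 0.5 for _ in x]
        child = [a if m else b for a, b, m in zip(x, y, mask)]
        # Bit-flip mutation.
        child = [1 - b if rng.random() < mutation_rate else b for b in child]
        children.append(tuple(child))
    return children

def run(pop, seed, generations):
    rng = random.Random(seed)
    for _ in range(generations):
        pop = next_generation(pop, rng)
    return pop

init_rng = random.Random(0)
start = [tuple(init_rng.randint(0, 1) for _ in range(6)) for _ in range(20)]

end = run(start, seed=1, generations=10)
mirrored_end = run([complement(g) for g in start], seed=1, generations=10)
```

With identical seeds, `mirrored_end` is the genome-by-genome complement of `end`: selection sees the same fitness values, and crossover and mutation commute with complementation. Since a uniformly drawn initial population is as likely as its complement, this pairing of runs is what lies behind $p^{(t)}(g) = p^{(t)}(\bar{g})$ for these particular operators.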

Jaynes stresses, as do we, that symmetry arguments rely not on 'equal ignorance', but on
'positive knowledge of symmetry'. For instance, going back to the example above, if we cannot
be sure that the initial population of $G$ is drawn from the uniform distribution over $\mathfrak{B}_\ell$, then for
any generation $t$, and any bitstring $g$, we would, in a sense, be 'equally ignorant' of the values
of $p^{(t)}(g)$ and $p^{(t)}(\bar{g})$. Our 'equal ignorance' does not, of course, entail that these two values are
the same.

But what constitutes 'positive knowledge of symmetry'? This question, like the question
"what constitutes beauty?", has no direct answer. Historically, the appreciation of a symmetry
by a community of theorists well versed in "the art" was enough to constitute 'positive knowledge' 
of that symmetry. Interestingly, over the last two centuries, mathematicians have largely agreed 
to eschew all non-sentential symmetries in their formal communication; these days, only certain 
types of symmetry between sentential forms are acknowledged. At the beginning of this deep 
shift in the communication of mathematics, Euclidean geometry was one of the only fields of
mathematics that had an axiomatic foundation [24]. By the end of the shift, "new as well as old
branches of mathematics . . . were supplied with what appeared to be adequate sets of axioms"
[24]. An obvious benefit of this shift is a reduction in the number of mistakes communicated.
The rarely acknowledged cost is the impedance of timely and accessible communication of results 
derived through the insightful exploitation of non-sentential forms of symmetry. 

3 Type 1 Pivotal Functions 

We begin by defining a class of fitness functions with bitstring inputs such that when any of 
these functions is queried with an infinite set of samples drawn from a uniform distribution 
over its domain, no locus has a main effect, even though some loci may interact epistatically
with others. For reasons that will soon become clear we call the members of this class pivotal 
functions. 

For any positive integer $n$, let $[n]$ denote the set of positive integers $\{1, \ldots, n\}$. For any
$n$-tuple $x$ and any $i \in [n]$ let $x_i$ denote the $i$th element of $x$. For any bitstring $s$ let $s_i$ denote
the $i$th symbol of $s$. For any bit $b$ let $\bar{b}$ denote the complement of $b$. Let $\mathcal{N}(\mu, \sigma^2)$ denote the
normal distribution with mean $\mu$ and variance $\sigma^2$.

Definition 1. Let $\psi = (o, \delta, \sigma, \ell, L, V)$ be a 6-tuple such that $o$ is a positive integer, $\delta$ and $\sigma$
are non-negative real numbers, $\ell$ is a positive integer greater than $o$, $V$ is an $o$-tuple of binary
values, and $L$ is an $o$-tuple of distinct positive integers in $[\ell]$ sorted in ascending order. A
type 1 pivotal function with descriptor $\psi$ is a stochastic function $f$ over the set of bitstrings of
length $\ell$ which behaves as follows: for any input bitstring $g$, if $(g_{L_1} = V_1) \wedge \ldots \wedge (g_{L_o} = V_o)$ or
$(g_{L_1} = \bar{V}_1) \wedge \ldots \wedge (g_{L_o} = \bar{V}_o)$ then $f$ returns a value drawn from $\mathcal{N}(\delta, \sigma^2)$, otherwise $f$ returns
a value drawn from $\mathcal{N}(-\delta/(2^{o-1}-1), \sigma^2)$.

We call $o$, $\delta$, $\sigma$, and $\ell$ the order, increment, noisiness, and span of a pivotal function
respectively. When a pivotal function is queried with some bitstring, the distribution from 
which the result is drawn pivots upon the values of the bitstring at the pivotal loci given by 
L, and the pivotal values given by V — hence the name pivotal function. If we assume that a 
type 1 pivotal function is queried with samples drawn from the uniform distribution over the 
function's domain, then the expected marginal value of each allele of any individual locus is zero. 
This is what we mean when we say that no locus has a main effect. The increment parameter 
$\delta$ determines the strength of the expected multilocus marginal values of the pivotal loci.
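
Definition 1 translates fairly directly into code. The following Python sketch is one possible rendering, not the implementation used in the experiments. The factory name `make_type1_pivotal` and the optional `rng` argument are hypothetical, loci are 1-based as in the definition, and the sketch additionally assumes an order of at least 2 so that the lower mean $-\delta/(2^{o-1}-1)$ is well defined.

```python
import random

def make_type1_pivotal(order, delta, sigma, span, loci, values, rng=None):
    """Build a stochastic fitness function following Definition 1.

    loci   -- ascending 1-based indices of the pivotal loci (the tuple L)
    values -- pivotal values, one bit per pivotal locus (the tuple V)
    """
    if order < 2 or span <= order:
        raise ValueError("this sketch assumes 2 <= order < span")
    if len(loci) != order or len(values) != order:
        raise ValueError("loci and values must both have `order` entries")
    if list(loci) != sorted(set(loci)) or loci[0] < 1 or loci[-1] > span:
        raise ValueError("loci must be distinct, ascending, and within 1..span")
    rng = rng or random.Random()
    pattern = tuple(values)
    flipped = tuple(1 - v for v in values)
    low_mean = -delta / (2 ** (order - 1) - 1)

    def f(g):
        # Only the bits at the pivotal loci influence the output distribution.
        bits = tuple(g[k - 1] for k in loci)
        mean = delta if bits in (pattern, flipped) else low_mean
        return rng.gauss(mean, sigma)

    return f

# A noisy instance: order 3, increment 0.18, noisiness 1, span 20.
f = make_type1_pivotal(3, 0.18, 1.0, 20, (4, 9, 17), (1, 0, 1), random.Random(0))
sample = f([0] * 20)

# A noise-free instance makes the two means visible directly.
quiet = make_type1_pivotal(3, 0.18, 0.0, 5, (1, 3, 4), (1, 0, 1))
```

Setting the noisiness to zero, as in `quiet`, returns the two means themselves (0.18 on the pivotal pattern 1·01· and its complement, about -0.06 elsewhere), which is convenient for checking that single-locus marginals vanish.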

Let $f$ be a type 1 pivotal function with descriptor $(o = 3, \delta = 0.18, \sigma = 1, \ell, L, V)$. The pdfs
of $\mathcal{N}(-0.06, \sigma^2)$ and $\mathcal{N}(0.18, \sigma^2)$ are shown in Figure 1. Consider the task of robustly recovering
the indices of the pivotal loci (i.e. the values of $L$) given only the values of $o$, $\delta$, $\sigma$ and $\ell$, and
query access to the function $f$. Because of the stochastic nature of $f$, as long as there is any
overlap between the two pdfs there will always be some probability of error. The large overlap
between the two pdfs shown in Figure 1 makes the minimization of this error expensive as $\ell$ gets
large (say 10^). But it is the absence of main effects that really makes this problem thorny. A
DMT strategy that visits loci one-by-one clearly will not work because none of the loci have
main effects. Such a strategy only begins to hold promise if loci are visited in combinations of
two or more. The number of such combinations however scales at least quadratically with $\ell$.
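
The contrast between one-by-one DMT and combinatorial DMT can be seen on a small instance by computing marginals exactly, with the noise averaged out. The sketch below is an idealized illustration, not a statistical test: it enumerates all bitstrings, which is only feasible for tiny spans, and the function names and the instance (span 6, pivotal loci 2, 4 and 5) are hypothetical.

```python
from itertools import combinations, product

def expected_value(g, loci, values, delta):
    """Mean of a type 1 pivotal function at bitstring g (noise averaged out)."""
    bits = tuple(g[k - 1] for k in loci)            # loci are 1-based
    flipped = tuple(1 - v for v in values)
    if bits in (tuple(values), flipped):
        return delta
    return -delta / (2 ** (len(loci) - 1) - 1)

def marginal(span, fixed, loci, values, delta):
    """Expected value under uniform sampling, given the bits fixed at some loci."""
    free = [k for k in range(1, span + 1) if k not in fixed]
    total = 0.0
    for bits in product((0, 1), repeat=len(free)):
        assignment = {**fixed, **dict(zip(free, bits))}
        g = [assignment[k] for k in range(1, span + 1)]
        total += expected_value(g, loci, values, delta)
    return total / 2 ** len(free)

def differentiated(span, size, loci, values, delta, tol=1e-9):
    """All locus combinations of the given size with some non-zero marginal."""
    hits = []
    for combo in combinations(range(1, span + 1), size):
        for alleles in product((0, 1), repeat=size):
            fixed = dict(zip(combo, alleles))
            if abs(marginal(span, fixed, loci, values, delta)) > tol:
                hits.append(combo)
                break
    return hits

span, loci, values, delta = 6, (2, 4, 5), (1, 0, 1), 0.18
single_hits = differentiated(span, 1, loci, values, delta)
pair_hits = differentiated(span, 2, loci, values, delta)
```

On this instance `single_hits` is empty, while `pair_hits` contains exactly the pairs drawn from the pivotal loci. The outer loop of `differentiated` visits every combination of the given size, which is where the at-least-quadratic growth for pairs comes from.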

We will show that an SGA with uniform crossover can identify the pivotal loci of $f$ relatively
robustly (with less than a 0.005 chance of misclassification per locus) in time that is linear in $\ell$,
and with a number of queries that is constant with respect to $\ell$.



3.1 Symmetry Analysis 

For our purposes a semi-parameterized SGA is an SGA with just two "parameters": a positive
integer $\ell$ which specifies the length of the genomes, and a fitness function over the set of bitstrings
$\mathfrak{B}_\ell$. Two semi-parameterized SGAs are considered to be distinct if they differ in their crossover
rates, say, or in the selection schemes that they use. It is important to clarify that the per-bit
mutation rate of a semi-parameterized SGA is not dependent on the genome length. For any
positive integer $\ell$, any semi-parameterized SGA $G$ and any fitness function $f$ over $\mathfrak{B}_\ell$, we use
the "oracle notation" of theoretical computer science to denote the (unparameterized) SGA that
results when the length of the bitstrings of $G$ is fixed at $\ell$, and the fitness function that $G$ queries
is set to $f$; specifically, we denote this SGA by $G^f$.

Figure 1: The pdfs of two normal distributions with standard deviation 1, and means -0.06
(grey) and 0.18 (black).

For any positive integer $n$ let $T_n$ denote the set $\{0, \frac{1}{n}, \frac{2}{n}, \ldots, 1\}$. Let us call the frequency
of 1's and 0's in a population at some locus $k$ in some generation $t$ the one-frequency and
zero-frequency respectively of locus $k$ in generation $t$. For any unparameterized SGA $A$ with
population size $N$, let $\mathbf{1}^{(t)}_{(A,i)} : T_N \to [0, 1]$ be a probability mass function such that, for any
$x \in T_N$, $\mathbf{1}^{(t)}_{(A,i)}(x)$ is the probability that the one-frequency of locus $i$ after $t$ generations of
running $A$ is $x$. Likewise let $\mathbf{0}^{(t)}_{(A,i)} : T_N \to [0, 1]$ be a probability mass function such that, for
any $x \in T_N$, $\mathbf{0}^{(t)}_{(A,i)}(x)$ is the probability that the zero-frequency of locus $i$ after $t$ generations
of running $A$ is $x$. We call such distributions one- and zero-frequency distributions. Finally,
let $\Uparrow^{(t)}_{(A,i)}$ and $\Downarrow^{(t)}_{(A,i)}$ be random variables that give the one- and zero-frequencies, respectively,
of locus $i$ in generation $t$. Clearly then, the probability mass functions of $\Uparrow^{(t)}_{(A,i)}$ and $\Downarrow^{(t)}_{(A,i)}$
are $\mathbf{1}^{(t)}_{(A,i)}$ and $\mathbf{0}^{(t)}_{(A,i)}$ respectively. Note that for any $x \in T_N$,
$\mathbf{0}^{(t)}_{(A,i)}(x) = \mathbf{1}^{(t)}_{(A,i)}(1 - x)$ and $\mathbf{1}^{(t)}_{(A,i)}(x) = \mathbf{0}^{(t)}_{(A,i)}(1 - x)$.

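Proposition 1. Let $G$ be a semi-parameterized SGA, and let $f$ be a type 1 pivotal function.
Then for any locus $k$ of $G^f$, and for any generation $t$,

(a) $\mathbf{1}^{(t)}_{(G^f,k)} = \mathbf{0}^{(t)}_{(G^f,k)}$

(b) $E[\Uparrow^{(t)}_{(G^f,k)}] = E[\Downarrow^{(t)}_{(G^f,k)}] = \frac{1}{2}$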
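Argument: For any generation $t$, and any locus $k$, part (a) follows by consideration of
the symmetry between $\mathbf{1}^{(t)}_{(G^f,k)}$ and $\mathbf{0}^{(t)}_{(G^f,k)}$ induced by the type 1 pivotal function $f$. Part (b)
follows from part (a) and the claim that for any unparameterized SGA $A$, any locus $i$ of $A$, and
for any generation $t$, $E[\Uparrow^{(t)}_{(A,i)}] + E[\Downarrow^{(t)}_{(A,i)}] = 1$. For a proof of the claim note that if $N$ is
the size of the population of $A$, then for any generation $t$, and for any locus $i$,

$$E[\Uparrow^{(t)}_{(A,i)}] = \sum_{x \in T_N} \mathbf{1}^{(t)}_{(A,i)}(x)\, x, \qquad E[\Downarrow^{(t)}_{(A,i)}] = \sum_{y \in T_N} \mathbf{0}^{(t)}_{(A,i)}(y)\, y.$$

So,

$$\begin{aligned}
E[\Uparrow^{(t)}_{(A,i)}] + E[\Downarrow^{(t)}_{(A,i)}]
&= \sum_{x \in T_N} \mathbf{1}^{(t)}_{(A,i)}(x)\, x + \sum_{y \in T_N} \mathbf{0}^{(t)}_{(A,i)}(y)\, y \\
&= \sum_{x \in T_N} \left[ \mathbf{1}^{(t)}_{(A,i)}(x)\, x + \mathbf{0}^{(t)}_{(A,i)}(1 - x)\,(1 - x) \right] \\
&= \sum_{x \in T_N} \mathbf{1}^{(t)}_{(A,i)}(x) \\
&= 1
\end{aligned}$$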

Note that Proposition 1 holds for any type 1 pivotal function, and a semi-parameterized
SGA with any population size, any commonly used selection operator (e.g. rank, tournament, 
fitness proportional etc.), any of the typical crossover and mutation operators, any mutation 
and crossover rates, and any fitness scaling scheme. Imagine having to prove all of this without 
appealing to the symmetries of SGAs with type 1 pivotal functions. 

Uniform crossover [1], if present, adds yet another exploitable symmetry. This form of
crossover was popularized by Syswerda [29] , who showed that uniform crossover can outperform 
one-point and two-point crossover on problems ranging from simple (e.g. one max) to complex 
(the travelling salesperson problem). A large amount of evidence for the practical utility of 
uniform crossover has since accumulated. Syswerda also observed that any homologous crossover 
operation can be represented by a probability distribution over the set of binary masks. Only 
in the case of uniform crossover, however, can the mask of a crossover operation be given by a 
string of independent identically distributed random binary variables. This absence of positional 
bias [7] is a crucial property of uniform crossover that we will exploit forthwith. For the sake of
brevity we call an SGA with uniform crossover a UGA. 
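
The mask representation of crossover, and what the absence of positional bias buys, can be sketched in a few lines of Python. The function names and the convention that a mask bit of 1 selects the first parent are assumptions made for this illustration.

```python
import random

def uniform_mask(length, rng):
    """Mask of independent, identically distributed fair bits."""
    return [rng.randint(0, 1) for _ in range(length)]

def apply_mask(x, y, mask):
    """Child takes the bit of x where the mask is 1, and of y where it is 0."""
    return [a if m else b for a, b, m in zip(x, y, mask)]

def permute(seq, order):
    """Relabel the loci of a sequence according to `order`."""
    return [seq[i] for i in order]

rng = random.Random(42)
x = [1, 1, 1, 1, 0, 0, 0, 0]
y = [0, 1, 0, 1, 0, 1, 0, 1]
mask = uniform_mask(len(x), rng)
child = apply_mask(x, y, mask)

order = [3, 0, 7, 1, 6, 2, 5, 4]   # an arbitrary relabeling of the loci
child_of_permuted = apply_mask(permute(x, order), permute(y, order),
                               permute(mask, order))
```

Relabeling the loci of both parents and of the mask relabels the child in the same way. For uniform crossover the permuted mask has exactly the same distribution as the original one, because its bits are independent and identically distributed; for one-point or two-point crossover this is not the case, since the set of admissible masks depends on the ordering of the loci.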

Definition 2. Let $f$ be a type 1 pivotal function with descriptor $(o, \delta, \sigma, \ell, L, V)$. Then the basic
form of $f$ is a type 1 pivotal function with descriptor $(o, \delta, \sigma, o + 1, (1, \ldots, o), (1, \ldots, 1))$.

According to this definition if some function $f^*$ is the basic form of some type 1 pivotal
function $f$ with order $o$ then the span of $f^*$ is $o + 1$. The first $o$ loci of any input to $f^*$ will be
pivotal, the last locus of any input to $f^*$ will be non-pivotal, and the pivotal values used by $f^*$
are all ones. We say that a type 1 pivotal function $f$ with descriptor $(o, \delta, \sigma, \ell, L, V)$ is basic if the
basic form of $f$ is $f$. Since the last three elements of the descriptor of $f$ are then derivable from
the first three, we write this descriptor as $(o, \delta, \sigma)$.
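
As a small illustration of Definition 2, the mapping from a descriptor to the descriptor of its basic form can be written as follows. The `Descriptor` type and its field names are hypothetical conveniences for this sketch.

```python
from typing import NamedTuple, Tuple

class Descriptor(NamedTuple):
    order: int                 # o
    delta: float               # increment
    sigma: float               # noisiness
    span: int                  # genome length
    loci: Tuple[int, ...]      # L, 1-based pivotal loci
    values: Tuple[int, ...]    # V, pivotal values

def basic_form(d: Descriptor) -> Descriptor:
    """Descriptor of the basic form: span o + 1, loci 1..o, all-ones values."""
    o = d.order
    return Descriptor(o, d.delta, d.sigma, o + 1, tuple(range(1, o + 1)), (1,) * o)

def is_basic(d: Descriptor) -> bool:
    """A descriptor is basic when it coincides with its own basic form."""
    return basic_form(d) == d

d = Descriptor(3, 0.18, 1.0, 100, (7, 42, 99), (1, 0, 1))
b = basic_form(d)
```

Here `b` has span 4, pivotal loci (1, 2, 3) and pivotal values (1, 1, 1), whatever the span, loci and values of `d` were, which is why the first three elements suffice to name a basic function.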

Proposition 2. Let $U$ be a semi-parameterized UGA, let $f$ be a type 1 pivotal function with
descriptor $(o, \delta, \sigma, \ell, L, V)$, and let $f^*$ be the basic form of $f$. Then, for any generation $t$,

(a) For any pivotal locus $k$, $\mathbf{1}^{(t)}_{(U^f,k)} = \mathbf{1}^{(t)}_{(U^{f^*},1)}$

(b) For any non-pivotal locus $k$, $\mathbf{1}^{(t)}_{(U^f,k)} = \mathbf{1}^{(t)}_{(U^{f^*},o+1)}$

In other words the one-frequency distribution of any pivotal locus of $U^f$ in some generation
$t$ is the same as the one-frequency distribution of the first locus of $U^{f^*}$ in generation $t$, and the
one-frequency distribution of any non-pivotal locus of $U^f$ in generation $t$ is the same as the
one-frequency distribution of the last locus of $U^{f^*}$ in generation $t$.

Argument: Let $f'$ be a type 1 pivotal function with descriptor $(o, \delta, \sigma, \ell, L, (1, \ldots, 1))$. We
shortly present four claims. Part (a) of Proposition 2 follows from Claim 1(a), Claim 2, Claim 4
and Proposition 1(a). Part (b) of the above proposition follows from Claim 1(b) and Claim 3.

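Claim 1. For any generation $t$, we have the following:

(a) For any $i \in [o]$, if $V_i = 1$, then $\mathbf{1}^{(t)}_{(U^f,L_i)} = \mathbf{1}^{(t)}_{(U^{f'},L_i)}$, otherwise $\mathbf{1}^{(t)}_{(U^f,L_i)} = \mathbf{0}^{(t)}_{(U^{f'},L_i)}$.

(b) For any non-pivotal locus $k$ of $U^f$, $\mathbf{1}^{(t)}_{(U^f,k)} = \mathbf{1}^{(t)}_{(U^{f'},k)}$.

Claim 2. For any generation $t$, and any $i \in [o]$,

$\mathbf{1}^{(t)}_{(U^{f'},L_i)} = \mathbf{1}^{(t)}_{(U^{f^*},i)}$

Claim 3. For any generation $t$, and any non-pivotal locus $k$ of $U^{f'}$, $\mathbf{1}^{(t)}_{(U^{f'},k)} = \mathbf{1}^{(t)}_{(U^{f^*},o+1)}$

Claim 4. For any generation $t$,

$\mathbf{1}^{(t)}_{(U^{f^*},1)} = \mathbf{1}^{(t)}_{(U^{f^*},2)} = \ldots = \mathbf{1}^{(t)}_{(U^{f^*},o)}$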

Claim 1 follows from the observation that in any generation the population of $U^{f'}$ can be
"changed into" the population of $U^f$ and vice versa by a simple $1 \leftrightarrow 0$ relabeling of all genomic
bits at those pivotal loci of $U^f$ whose corresponding pivotal values are 0.

Claim 2 follows by consideration of the symmetry between loci $L_1, \ldots, L_o$ of $U^{f'}$ and loci
$1, \ldots, o$ of $U^{f^*}$ respectively. Claim 3 follows by consideration of the symmetry between any
non-pivotal locus of $U^{f'}$ and locus $o + 1$ of $U^{f^*}$. These symmetries follow from the absence of
positional bias in uniform crossover and from the definition of the pivotal functions $f^*$ and $f'$.
Figure 2: Subfigure (a) shows a "vertical view" of a hypothetical population. Each row is a
genome. The shaded columns show the positions of three hypothetical pivotal loci. By the
definition of a type 1 pivotal function (see text), only the bits in the shaded columns matter
during selection. Subfigure (b) shows a "vertical view" of a hypothetical crossover operation.
Two parents, x and y, are about to undergo uniform crossover which will yield a child z. The
bits of z will be determined by the values of the independent identically distributed random
binary variables that comprise the mask m.



To make these symmetries manifest, we offer the two "vertical views" shown in Figure 2. 
Figure 2(a) shows a hypothetical population of $U^{f}$ with three pivotal loci (whose locations are 
marked by shaded columns). Given the definition of the fitness function of $U^{f}$, it is easy to 
see that the fitness of any genome depends only upon the value of that genome's bits at the 
pivotal loci. Thus only the bits in the shaded columns of Figure 2(a) matter in determining a 
genome's fitness, and by extension its chance of being selected (note that this is true regardless 
of the selection scheme used). Figure 2(b) shows a "vertical view" of a hypothetical uniform 
crossover operation in $U^{f}$. Two genomes, x and y, have been selected for uniform crossover. 
The crossover mask m is represented as a string of random binary variables. The values of these 
variables determine the bits of the child z. Because crossover is uniform, the random variables 
in m are independent and identically distributed. 
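
The crossover operation in Figure 2(b) can be sketched in a few lines of Python. This is only an illustration, not the Matlab code used for the experiments: the function name `uniform_crossover` and the convention that a mask bit of 1 copies the bit from x (and 0 copies it from y) are assumptions made for the example.

```python
import random


def uniform_crossover(x, y, rng):
    """Return (child, mask) for uniform crossover of two equal-length parents.

    Assumed convention (hypothetical): child bit i is x[i] where mask[i] == 1,
    and y[i] where mask[i] == 0.
    """
    if len(x) != len(y):
        raise ValueError("parents must have the same length")
    # One independent fair coin flip per locus; the position of a locus
    # plays no role, which is the "absence of positional bias".
    mask = [rng.randint(0, 1) for _ in x]
    child = [xi if mi == 1 else yi for xi, yi, mi in zip(x, y, mask)]
    return child, mask


rng = random.Random(0)  # seeded only so the sketch is repeatable
x = [1] * 18            # parent x: all ones
y = [0] * 18            # parent y: all zeros
z, m = uniform_crossover(x, y, rng)
print("m =", m)
print("z =", z)
```

With an all-ones x and an all-zeros y, the child z simply reproduces the mask m, which makes it easy to see that each locus is treated identically and independently.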

Claim 4 follows from the symmetry that exists between each of the first o loci of $U^{f^*}$. 
This symmetry follows from the absence of positional bias in uniform crossover, and from the 
definition of the fitness function used by $U^{f^*}$. □ 

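By Proposition 2(a), for any pivotal locus k of $U^{f}$, drawing Monte Carlo samples from the 
frequency distribution of locus k in generation t (the one-frequency distribution if the pivotal 
value of k is 1, the zero-frequency distribution otherwise) is equivalent to drawing Monte Carlo 
samples from the one-frequency distribution of the first locus of $U^{f^*}$ in generation t. And by 
Proposition 2(b), for any non-pivotal locus k of $U^{f}$, drawing Monte Carlo samples from the 
one-frequency distribution of k in generation t is equivalent to drawing Monte Carlo samples 
from the one-frequency distribution of the last locus of $U^{f^*}$ in generation t. 
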
3.2 Experiment 1 

Let W denote the semi-parameterized UGA described in the materials and methods section in 
the appendix, and let $f_1^*$ denote a basic type 1 pivotal function with descriptor $(o = 3, \delta = 
0.18, \sigma = 1)$. Figure 3 shows the one-frequency dynamics of the first and last locus of $W^{f_1^*}$ in 
each of 3000 runs. In all 3000 runs the first locus went to fixation by the 200th generation, 
whereas the one-frequency of the last locus in the 200th generation was always between 0.07 
and 0.93. 

In order to clearly describe the rest of our findings we develop the following notation. We 
denote a schema partition [19] by a tuple consisting of the indices of the defining positions of 
that schema partition, e.g. (2, 15, 3). The order of a schema partition $\Gamma$, denoted by $o(\Gamma)$, is 
the number of elements in some tuple that denotes $\Gamma$. The denotation of a schema is dependent 
on the denotation of the schema partition that the schema belongs to. For any genome $g$, let 
$g_i$ denote the $i$-th bit of $g$. Given a schema partition denoted by some tuple $\Gamma$, the schemata in 
this partition are denoted by binary strings of length $o(\Gamma)$. Let $b_1, \ldots, b_{o(\Gamma)}$ be some bits. Then 
$b_1 \ldots b_{o(\Gamma)}$ denotes the schema consisting of the genomes 
$\{g \mid g_{\Gamma_1} = b_1 \wedge \ldots \wedge g_{\Gamma_{o(\Gamma)}} = b_{o(\Gamma)}\}$. The 
denotation of the relevant schema partition must always be borne in mind when interpreting a 
denoted schema. 
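
To make the notation concrete, the following Python sketch checks schema membership and tallies schema frequencies for a small made-up population. The helper names `in_schema` and `schema_frequencies`, and the representation of genomes as lists of 0/1 integers with 1-based defining positions, are assumptions introduced only for this illustration.

```python
def in_schema(genome, partition, schema):
    """True if genome belongs to the schema `schema` of the schema partition.

    partition: tuple of 1-based defining positions, e.g. (2, 15, 3)
    schema:    string of bits, one per defining position, e.g. "010"
    """
    if len(partition) != len(schema):
        raise ValueError("schema length must equal the order of the partition")
    return all(genome[pos - 1] == int(bit) for pos, bit in zip(partition, schema))


def schema_frequencies(population, partition):
    """Fraction of the population that falls in each schema of the partition."""
    counts = {}
    for genome in population:
        key = "".join(str(genome[pos - 1]) for pos in partition)
        counts[key] = counts.get(key, 0) + 1
    return {key: count / len(population) for key, count in counts.items()}


# A made-up population of four genomes of length 4.
population = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]
freqs = schema_frequencies(population, (1, 2, 3))
print(freqs)
```

Note that the same genome is described by different bit strings under different denotations of the partition: the genome `[0, 1, 1, 0]` belongs to schema 10 of the partition (3, 1) but to schema 01 of the partition (1, 3).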

In addition to the findings reported above, we found that 200 generations into each run of 
$W^{f_1^*}$, either the schema 000, or the schema 111, of the schema partition (1, 2, 3), dominated the 
population. The average fraction of the population that belonged to the dominant schema at 
the end of 200 generations was 0.9563 (with standard error 1.56 x 10"'^). 

Given the conclusions of our symmetry analysis, the result shown in Figure 3 provides us 
with a window into the frequency dynamics of any UGA $W^{f_1}$, where $f_1$ is a type 1 pivotal 
function whose basic form is $f_1^*$. We infer that the pivotal loci of $W^{f_1}$ will tend to go to fixation 
by the 200th generation. We also infer that the divergence from 0.5 of the one-frequencies of 
the non-pivotal loci of $W^{f_1}$ will tend not to be extreme. 

We now explain the behavior of $W^{f_1}$ that we have just deduced. Note that while this discussion 
is speculative and imprecise, it is entirely tangential to our aim of identifying computational 
competencies of the SGA. The one-frequency dynamics of the non-pivotal loci of $W^{f_1}$ are easily 
explained by the notion of drift. To understand the frequency dynamics of the pivotal loci, it 
helps to go back to the "vertical views" presented in Figure 2. Observe that, discounting the 
effect of sampling error, only selection and mutation have an effect on the composition of the 
bit-pool of each locus. Crucially, crossover does not change the composition of these bit-pools. 
Now, let $x_1$, $x_2$, and $x_3$ denote the indices of the pivotal loci of $W^{f_1}$, and, without loss of 
generality, suppose that the pivotal values of the first, second, and third pivotal loci are 0, 1, and 
0 respectively. Consider the frequency dynamics of the schema 010 of the site-array $(x_1, x_2, x_3)$. 
The probability of generating genomes of type 010 in some generation is highly (though not 
completely) dependent upon the composition of the bit-pools of the pivotal loci in the previous 
generation. A genome of type 010, once generated, will tend to be preferentially selected over 
all other genomes except those that belong to the "sibling" schema 101. Thus, regardless of 
what happens during crossover, once generated, a genome of type 010 will tend to increase the 
frequency of 0, 1, and 0 in the bit-pools of the first, second, and third pivotal loci respectively. 
This makes conditions more favorable for the generation of genomes of type 010 in future 
generations. Of course, the same argument applies to genomes of type 101. Now, given that the 
alleles 1 and 0 are, in a sense, "rivals" of each other at each locus, the schemata 010 and 101 
"compete" for dominance of the bit-pools of each of the pivotal loci. One of these schemata 
eventually manages to gain an edge in "pulling" the composition of the bit-pools of all three 
pivotal loci far enough in its favor that a self-reinforcing loop that heavily favors the future 
generation of the victorious schema then ensues. 

Footnote: We use the term 'fixation' loosely. Clearly, as long as the mutation rate is non-zero, 
no locus can ever be said to go to fixation in the strict sense of the word. 

Figure 3: The one-frequency dynamics of the first (left) and fourth (right) loci of the UGA $W^{f_1^*}$ 

In light of this analysis, one can conclude that the building block hypothesis [12] [19] [15] 
takes an overly grim view of the disruption of fit low-order schemata with high defining lengths. 
This view misses the fact that the "debris" from the disruption of such a schema changes the 
composition of the bit-pools at the defining positions of that schema in a way that favors the 
future generation of genomes belonging to the schema. 

Footnote: Changes in the one and zero frequencies of a locus can be visualized as changes in the 
composition of a pool of bits. The bit-pool metaphor is especially useful in conjunction with the 
"vertical view" of a population presented in Figure 2(a): each column can be thought of as a pool 
of bits. 

Algorithm 1: ClassifyLoci_n 

Input: a type 1 or type 2 pivotal function f 
ℓ = span of f 
pivotalLoci = {} 
nonPivotalLoci = {} 
P = population of W^f after n generations 
for i = 1 to ℓ do 
    x = one-frequency of locus i in population P 
    if 0.1 < x and x < 0.9 then 
        nonPivotalLoci = nonPivotalLoci ∪ {i} 
    else 
        pivotalLoci = pivotalLoci ∪ {i} 
    end 
end 
return pivotalLoci, nonPivotalLoci 
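
The classification step of Algorithm 1 can be sketched in Python as follows. The sketch covers only the thresholding of one-frequencies; it assumes the population P after n generations is already available as a list of genomes (lists of 0/1 integers), and the function names are hypothetical. Running the UGA itself is outside the scope of this example.

```python
def one_frequencies(population):
    """One-frequency of each locus: fraction of genomes with a 1 at that locus."""
    n = len(population)
    span = len(population[0])
    return [sum(genome[i] for genome in population) / n for i in range(span)]


def classify_loci(population, low=0.1, high=0.9):
    """Split loci (1-based) into pivotal and non-pivotal, as in Algorithm 1.

    A locus whose one-frequency lies strictly between `low` and `high`
    is classified as non-pivotal; every other locus as pivotal.
    """
    pivotal, non_pivotal = [], []
    for i, x in enumerate(one_frequencies(population), start=1):
        if low < x < high:
            non_pivotal.append(i)
        else:
            pivotal.append(i)
    return pivotal, non_pivotal


# Toy population: locus 1 fixed at 1, locus 2 fixed at 0,
# locus 3 at frequency 0.5, locus 4 at frequency 0.3.
toy_population = [[1, 0, i % 2, 1 if i < 3 else 0] for i in range(10)]
pivotal_loci, non_pivotal_loci = classify_loci(toy_population)
print(pivotal_loci, non_pivotal_loci)
```

The single pass over the loci is what makes the classification step linear in the span once the final population is in hand.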

3.3 A Computational Competency of the SGA 

Consider Algorithm 1. The results of our experiment with $W^{f_1^*}$ suggest that when 
ClassifyLoci_200 is applied to $f_1$, it will classify each locus of $f_1$ fairly accurately. Let us 
quantify this accuracy. Note that in all 3000 runs of $W^{f_1^*}$, by the end of the 200th generation, 
the one-frequency of the first locus was outside the interval [0.1, 0.9], and the one-frequency 
of the last locus was inside this interval. Let $\alpha$ be the probability that the one-frequency of 
the first locus of $W^{f_1^*}$ will be inside [0.1, 0.9] at the end of the 200th generation. Let $H_0$ be 
the hypothesis that $\alpha \ge 0.005$. If $H_0$ is true then the probability that the one-frequency of the 
first locus will be outside [0.1, 0.9] at the end of the 200th generation in each of 3000 runs (as 
observed in the above experiment) is at most $(1 - 0.005)^{3000} < 3 \times 10^{-7}$. Therefore we reject $H_0$ 
at the $3 \times 10^{-7}$ level of significance. Now consider the hypothesis that with probability greater 
than or equal to 0.005 the one-frequency of the last locus of $W^{f_1^*}$ will be outside the interval 
[0.1, 0.9] at the end of 200 generations. Using very similar reasoning we reject this hypothesis at 
the $3 \times 10^{-7}$ level of significance. Thus, with probability of error less than three in ten million, 
the following statement is true: for any locus k of $f_1$, there is less than a 0.005 probability that 
ClassifyLoci_200 will misclassify locus k. 
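
The bound used in this test is easy to check numerically. The short Python sketch below evaluates the probability of observing 3000 consecutive runs with no misclassification when the per-run misclassification probability is exactly 0.005; the variable names are chosen for the example only.

```python
runs = 3000          # number of independent runs in the experiment
threshold = 0.005    # hypothesized per-run misclassification probability

# Probability that none of the runs shows a misclassification
# when the per-run probability equals the threshold.
bound = (1 - threshold) ** runs
print(f"(1 - {threshold})^{runs} = {bound:.3e}")

# The rejection level quoted in the text.
assert bound < 3e-7
```

Any per-run probability larger than 0.005 makes the all-correct outcome even less likely, so the value computed here is the worst case under the hypothesis being rejected.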

Note that $\ell$ may be any positive integer greater than 3. There are $\binom{\ell}{3} \in \Omega(\ell^3)$ possible 
combinations of the three pivotal indices. Remarkably, ClassifyLoci_200 achieves the level 
of robustness mentioned above (p < 0.005 per locus) in time that is linear in $\ell$, after making 
some number of fitness evaluations that is constant with respect to $\ell$. 

Footnote: To obtain this bound we have used the inequality $(n/k)^k \le \binom{n}{k}$. See [30]. 

4 Type 2 Pivotal Functions 



If the order o of a type 1 pivotal function is greater than one, then even though no individual locus 
will have differentiated marginal effects, certain combinations of two loci will have differentiated 
(multilocus) marginal effects (specifically those combinations in which both loci are pivotal). 
Thus for any o > 1, loci can be classified as pivotal or non-pivotal with a fixed level of robustness 
using a combinatorial DMT strategy that visits some number of two-locus combinations that is 
quadratic in the span of the pivotal function. Type 2 pivotal functions are expressly defined so 
that for any even order o, and any positive integer m < o, no combination of m loci will have 
differentiated marginal effects. 

Let $\oplus$ denote the exclusive-or operator (also the binary addition modulo 2 operator). Type 
2 pivotal functions are defined as follows: 

Definition 3. Let $\psi = (o, \delta, \sigma, \ell, L)$ be a 5-tuple such that o is a positive even integer, $\delta$ and $\sigma$ 
are positive real numbers, $\ell$ is a positive integer greater than o, and L is an o-tuple of positive 
integers in $[\ell]$ sorted in ascending order. A type 2 pivotal function with descriptor $\psi$ is a 
stochastic function f which behaves as follows: for any input bitstring g, if $g_{L_1} \oplus \ldots \oplus g_{L_o} = 1$ 
then f returns a value drawn from $\mathcal{N}(\delta, \sigma^2)$, otherwise f returns a value drawn from $\mathcal{N}(-\delta, \sigma^2)$. 

The order of a type 2 pivotal function is always even; furthermore, no pivotal values are 
associated with the pivotal loci. 
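
Definition 3 translates fairly directly into code. The Python sketch below is an illustration under stated assumptions rather than the authors' implementation: the names `pivotal_parity` and `make_type2_pivotal` are hypothetical, bitstrings are lists of 0/1 integers, and loci are 1-based as in the text.

```python
import random
from functools import reduce
from operator import xor


def pivotal_parity(g, loci):
    """Exclusive-or of the bits of g at the (1-based) pivotal loci."""
    return reduce(xor, (g[k - 1] for k in loci))


def make_type2_pivotal(o, delta, sigma, span, loci, rng):
    """Build a stochastic fitness function with descriptor (o, delta, sigma, span, loci)."""
    if o <= 0 or o % 2 != 0:
        raise ValueError("order o must be a positive even integer")
    if delta <= 0 or sigma <= 0:
        raise ValueError("delta and sigma must be positive")
    if span <= o:
        raise ValueError("span must be greater than the order")
    if len(loci) != o or list(loci) != sorted(set(loci)):
        raise ValueError("loci must be o distinct indices in ascending order")
    if loci[0] < 1 or loci[-1] > span:
        raise ValueError("loci must lie in 1..span")

    def f(g):
        if len(g) != span:
            raise ValueError("bitstring length must equal the span")
        # Mean is +delta for odd parity of the pivotal bits, -delta otherwise.
        mean = delta if pivotal_parity(g, loci) == 1 else -delta
        return rng.gauss(mean, sigma)

    return f


rng = random.Random(1)  # seeded only so the sketch is repeatable
f = make_type2_pivotal(4, 0.25, 1.0, 6, (1, 3, 4, 6), rng)
print(f([1, 0, 0, 0, 0, 0]))  # one noisy evaluation of an odd-parity genome
```

Because the order is even, flipping every bit of a genome leaves the parity of its pivotal bits unchanged, so a genome and its bitwise complement have the same fitness distribution.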

Let $f$ be some type 2 pivotal function with span $\ell$. Observe that for any bitstring $g$ of length 
$\ell$, $f(g) = f(\bar{g})$, where $\bar{g}$ denotes the bitwise complement of $g$. Observe also that as $\oplus$ is 
associative and commutative, the order in which it is applied to the pivotal bits of some bitstring 
is immaterial. Both of these observations reveal symmetries of $f$ that we will exploit forthwith. 
These symmetries can also be seen in Table 2, which shows the marginal of the pivotal loci of 
some type 2 pivotal function with increment $\delta$ and order four. 

Finally observe that for any type 2 pivotal function with order o, the multilocus marginal 
of any combination of m < o alleles of distinct loci will not be differentiated. Thus the time 
complexity of a combinatorial DMT strategy that robustly identifies the pivotal loci is $\Omega(\ell^{o})$. 
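
This observation can be verified by brute force for small orders. The Python sketch below assumes that the marginal of a combination of alleles is the expected fitness averaged uniformly over all assignments to the remaining bits; only the pivotal bits are enumerated, since non-pivotal bits do not affect the expected fitness. The function names are hypothetical.

```python
from itertools import combinations, product


def expected_fitness(pivotal_bits, delta):
    """Expected value of a type 2 pivotal function given its pivotal bits."""
    return delta if sum(pivotal_bits) % 2 == 1 else -delta


def multilocus_marginals(o, delta, m):
    """Set of expected marginals over all choices of m pivotal loci and alleles."""
    values = set()
    for subset in combinations(range(o), m):
        for alleles in product((0, 1), repeat=m):
            total, count = 0.0, 0
            # Average over every assignment of the o pivotal bits that
            # agrees with the chosen alleles at the chosen loci.
            for bits in product((0, 1), repeat=o):
                if all(bits[i] == a for i, a in zip(subset, alleles)):
                    total += expected_fitness(bits, delta)
                    count += 1
            values.add(total / count)
    return values


for m in range(1, 5):
    print(m, sorted(multilocus_marginals(4, 0.25, m)))
```

For order four and increment 0.25, every combination of fewer than four pivotal loci yields a single marginal value, so nothing distinguishes one allele combination from another until all four pivotal loci are examined together.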

Let $f$ be a type 2 pivotal function with descriptor $(o = 4, \delta = 0.25, \sigma = 1, \ell, L)$. We now 
show that a UGA can identify the pivotal loci of $f$ relatively robustly (with less than a 0.005 
chance of misclassification per locus) in time that is linear in $\ell$, and with some number of queries 
that is constant with respect to $\ell$. Our approach is almost identical to the approach we took in 
Section 3, where we showed a similar result for a class of pivotal functions of type 1. 

4.1 Symmetry Analysis 

We define the basic form of a type 2 pivotal function as follows: 

Footnote: As o increases, the constant associated with this scaling relationship will increase very 
quickly. Nevertheless for any fixed value of o, the number of combinations that must be visited 
scales quadratically with the span of a type 1 pivotal function. 

Table 2: The expected marginal of the pivotal loci of a type 2 pivotal function 

Definition 4. Let f be some type 2 pivotal function with descriptor $(o, \delta, \sigma, \ell, L)$. We define the 
basic form of f to be a type 2 pivotal function with descriptor $(o, \delta, \sigma, o + 1, (1, \ldots, o))$. 

Let $f$ be a type 2 pivotal function with descriptor $(o, \delta, \sigma, \ell, L)$. We say that $f$ is basic if 
the basic form of $f$ is $f$. Since the last two elements of the descriptor of a basic $f$ are derivable 
from the first three, we write this descriptor as $(o, \delta, \sigma)$. 

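Proposition 3. Let U be a semi-parameterized UGA, let f be a type 2 pivotal function with 
descriptor $(o, \delta, \sigma, \ell, L)$, and let $f^*$ be the basic form of f. Then, for any generation t, 

(a) For any pivotal locus k of $U^{f}$, the one-frequency distribution of k in generation t is the 
same as the one-frequency distribution of the first locus of $U^{f^*}$ in generation t. 

(b) For any non-pivotal locus k of $U^{f}$, the one-frequency distribution of k in generation t is 
the same as the one-frequency distribution of the last locus of $U^{f^*}$ in generation t. 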

This proposition is almost word-for-word identical to Proposition 2. Likewise, the argument 
for this proposition is very similar to the argument for Proposition 2. We omit this argument on 
the assumption that it will be clear to readers who have digested the argument for Proposition 2. 

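By Proposition 3(a), for any pivotal locus k of $U^{f}$, drawing Monte Carlo samples from the 
one-frequency distribution of k in generation t is equivalent to drawing Monte Carlo samples 
from the one-frequency distribution of the first locus of $U^{f^*}$ in generation t. And by 
Proposition 3(b), for any non-pivotal locus k of $U^{f}$, drawing Monte Carlo samples from the 
one-frequency distribution of k in generation t is equivalent to drawing Monte Carlo samples 
from the one-frequency distribution of the last locus of $U^{f^*}$ in generation t. 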

4.2 Experiment 2 

Recall that W denotes the semi-parameterized UGA described in the materials and methods 
section in the appendix. Let $f_2^*$ denote a basic type 2 pivotal function with descriptor $(o = 
4, \delta = 0.25, \sigma = 1)$. We executed 3000 runs of the UGA $W^{f_2^*}$. The one-frequency dynamics of 
the first and last loci in each run are plotted in Figure 4. In all 3000 runs the first locus went 
to fixation by generation 1000, whereas the one-frequency of the last locus in generation 1000 
was always between 0.1 and 0.9. We found that 1000 generations into each run the population 
was dominated by some schema $b_1 b_2 b_3 b_4$ of the site-array (1, 2, 3, 4) with $b_1 \oplus b_2 \oplus b_3 \oplus b_4 = 1$. 
On average the fraction of the population that belonged to the dominant schema in generation 
1000 was 0.9634 (with standard error 1.22 x lO""^). 

Let $f_2$ be a type 2 pivotal function with span $\ell$ such that the basic form of $f_2$ is $f_2^*$. The 
conclusions of our symmetry analysis of type 2 pivotal fitness functions, and the result shown in 
Figure 4, provide us with a window into the frequency dynamics of all pivotal and non-pivotal 
loci of $W^{f_2}$. 



4.3 A Second Computational Competency of the SGA 



Based on arguments that are almost identical to the ones in Section 3.3 we conclude, with 
probability of error less than three in ten million, that the following statement is true: For any 
locus of $f_2$, the probability that the locus will be misclassified by ClassifyLoci_1000 is less 
than 0.005. There are $\binom{\ell}{4} \in \Omega(\ell^4)$ possible configurations of the pivotal indices. Remarkably, 
ClassifyLoci_1000 achieves the level of robustness mentioned above (p < 0.005 per locus) in 
time that is linear in $\ell$, after making some number of fitness evaluations that is constant with 
respect to $\ell$. 

It merits mentioning that both the computational competencies showcased in this paper 
are essentially invisible to analytic approaches in which an infinite population is assumed (e.g. 
|32|[5lH]). This is because without some kind of symmetry breaking, the one and zero frequencies 
of the pivotal loci will not depart from 1/2. The role of symmetry breaking is performed here 
by sampling error which is absent in infinite population models of genetic algorithms. 



5 Conclusion 

This paper can be viewed as a response of sorts to the nihilism that the no free lunch theorems [33] 
have inspired. As mentioned in the introduction, no free lunch theorems are frequently regarded 
as "proof" that in the practice of black-box optimization, one search algorithm is as good as 
another. If this is indeed the case, then genetic algorithms have no special advantage over other 
search algorithms; the fortuitous pairing of optimization problem with search algorithm is all. 
Effort toward the identification of computational efficiencies underlying the genetic algorithm's 
capacity for practical black-box optimization then seems misplaced. 

Figure 4: The one-frequency dynamics of the first (left) and fifth (right) loci of the UGA $W^{f_2^*}$ 

Nihilism of this sort is puzzling given the frequency with which genetic algorithms are suc- 
cessfully applied as black-box optimization algorithms in practice. The importance accorded 
to Wolpert and Macready's results seems to be a reaction, or rather, an overreaction, to the 
empirical discovery of serious flaws [201 El [TH] in certain widely accepted statements about the 
computational efficiency of the simple genetic algorithm; these statements came to be regarded 
as fact because of an unclear demarcation between mathematical deduction and speculation in 
the early genetic algorithmics literature [6]. 

It is important to emphasize that the problem was not speculation itself but the blurring of 
the boundary between deduction and speculation. If the mistakes of the past are not to be 
revisited, this boundary must be clearly maintained at all times. Accordingly, we have distinguished 
between a computational competency of the SGA (a specific computational efficiency that can 
be derived deductively) and a computational proficiency of the SGA (a general computational 
efficiency that is inferred inductively from a set of related computational competencies). 

In this paper we have derived two closely related computational competencies of the SGA. 
The general domain in which these competencies lie is noteworthy. A central explanandum of the 
field of genetic algorithmics, and indeed, of the field of evolutionary biology, is the persistence of 
adaptation in sexually evolving populations despite the ubiquity of epistatic interactions between 
unlinked genomic loci. That the SGA can, in particular cases, robustly and scalably "identify" 
small numbers of unlinked epistatically interacting loci with no main effects, and moreover, that 
the SGA does so by sending specific genotypes with above average fitness to fixation, is likely 
to be material to theories about the adaptive prowess of both the genetic algorithm and natural 
evolution. 

Footnote: We clarify here that we offer the symmetry arguments presented in this paper as 
deductive arguments, not speculative ones. These results have, of course, not been derived within 
some formal axiomatic system. However, to argue that this automatically makes our arguments 
non-deductive is to argue that deduction played a very limited role in mathematics before the 
nineteenth century. 

Materials and Methods 



The arguments in this paper rest not just on the qualitative behavior of the semi-parameterized 
UGA we used, but on aspects of its quantitative behavior. It is therefore necessary to describe 
this UGA in sufficient detail that these aspects can be reproduced. While we believe that the 
description given below is sufficient to reproduce all quantitative aspects of the behavior of our 
UGA that are relevant to our proofs, small differences in implementation may interfere with 
reproducibility. We therefore release the Matlab code for the semi-parameterized UGA that 
we used. This code, along with the code for the pivotal fitness functions used, is available for 
download at the URL given at the end of this section. 

The semi-parameterized UGA we used (denoted by W in this paper) implements the 
specification for a simple genetic algorithm given by Mitchell [19, p. 10], with two exceptions: 



1. In each generation, right after evaluating the fitness of all individuals, our UGA used 
sigma scaling [19, p. 167] to adjust the fitness of each individual, and used this adjusted 
fitness when selecting the parents of that generation. Suppose $f_x^{(t)}$ is the fitness of some 
individual x in some generation t, and suppose the average fitness and standard deviation 
of the fitness of the individuals in generation t are given by $\bar{f}^{(t)}$ and $\sigma^{(t)}$ respectively. Then 
the adjusted fitness of x in generation t is given by $h_x^{(t)}$ where, if $\sigma^{(t)} = 0$ then $h_x^{(t)} = 1$, 
otherwise, 

$h_x^{(t)} = \max\left(0,\; 1 + \frac{f_x^{(t)} - \bar{f}^{(t)}}{\sigma^{(t)}}\right)$ 

2. Our UGA used stochastic universal sampling [3], [19, p. 166] to select parents. 



Selection is fitness-proportionate, the population size is 1500, the probability of crossover is 
one, and bit-flip mutation with a mutation rate of 0.003 per bit is used. 
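
The two departures from Mitchell's specification can be sketched in Python as follows. This is not the released Matlab code, and several details are assumptions made for the example: negative adjusted fitness values are clamped to zero, the scaling divides by one population standard deviation, and the function names are hypothetical.

```python
import random
import statistics


def sigma_scale(fitnesses):
    """Adjusted fitness: 1 + (f - mean) / sd, clamped below at 0 (assumed form)."""
    mean = statistics.fmean(fitnesses)
    sd = statistics.pstdev(fitnesses)  # population standard deviation (assumption)
    if sd == 0:
        return [1.0] * len(fitnesses)
    return [max(0.0, 1.0 + (f - mean) / sd) for f in fitnesses]


def stochastic_universal_sampling(weights, n, rng):
    """Pick n indices using n equally spaced pointers and one random offset."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    step = total / n
    start = rng.random() * step  # single random draw shared by all pointers
    chosen = []
    i, cumulative = 0, weights[0]
    for k in range(n):
        pointer = start + k * step
        # Advance to the individual whose cumulative-weight segment holds the pointer.
        while pointer >= cumulative and i < len(weights) - 1:
            i += 1
            cumulative += weights[i]
        chosen.append(i)
    return chosen


rng = random.Random(0)  # seeded only so the sketch is repeatable
raw = [rng.gauss(0.0, 1.0) for _ in range(8)]
adjusted = sigma_scale(raw)
parents = stochastic_universal_sampling(adjusted, len(raw), rng)
print(parents)
```

Stochastic universal sampling uses one random number per generation rather than one per parent, so the number of times an individual is selected can differ from its expected value by less than one.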



* http://cs.brandeis.edu/~kekib/competenciesMatlab.zip 